Sentiment analysis of Arabic tweets using supervised machine learning (in English)
Abstract
The increasing volume of user-generated content on social media platforms calls for effective tools for understanding public sentiment. This study presents an approach to sentiment analysis of Arabic tweets using supervised machine learning. We evaluated three popular algorithms, Support Vector Machines (SVM), Naive Bayes (NB), and Logistic Regression (LR), on two distinct corpora: the Arabic Sentiment Text Corpus (ASTC) and a dataset of Arabic tweets. Our methodology comprised four tests assessing the impact of corpus characteristics, preprocessing techniques, N-grams, and weighting methods on classification accuracy. The first test established that the choice of corpus significantly influences model performance: SVM achieved the highest accuracy on the structured ASTC, while NB performed best on the informal Arabic tweets. In the second test, preprocessing steps, including the removal of punctuation and stop-words, noticeably improved classification accuracy for the Arabic tweets but had minimal or even negative effects on the ASTC. The third test indicated that incorporating N-grams yielded modest improvements for NB and LR on more structured texts, while their impact on tweets was negligible. Finally, the fourth test compared different weighting techniques, revealing that SVM benefitted from Term Frequency-Inverse Document Frequency (TF-IDF) weighting, while NB performance remained stable regardless of the weighting approach. These findings underscore the importance of tailoring preprocessing and feature extraction strategies to the specific characteristics of the dataset in order to improve the accuracy of sentiment analysis in Arabic language contexts.
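The feature-extraction steps named in the abstract (punctuation and stop-word removal, N-grams, TF-IDF weighting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the function names, and the particular TF-IDF variant (raw count times log(N / document frequency)) are assumptions chosen for illustration, and the two sample sentences are invented.

```python
import math
import string
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"في", "من", "على", "و"}

# ASCII punctuation plus a few common Arabic punctuation marks.
PUNCTUATION = string.punctuation + "،؛؟"


def preprocess(text, remove_stop_words=True):
    """Strip punctuation, split on whitespace, optionally drop stop-words."""
    cleaned = text.translate(str.maketrans("", "", PUNCTUATION))
    tokens = cleaned.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def ngrams(tokens, n_max=2):
    """Return all word n-grams for n = 1..n_max, joined with spaces."""
    features = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


def tf_idf(documents):
    """Weight each feature by its raw count times log(N / document frequency).

    This is one common TF-IDF variant (an assumption here); features that
    occur in every document receive weight 0.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weighted = []
    for doc in documents:
        counts = Counter(doc)
        weighted.append(
            {f: c * math.log(n_docs / doc_freq[f]) for f, c in counts.items()}
        )
    return weighted


# Invented sample sentences ("the service is very excellent" / "... very bad").
tweets = ["الخدمة ممتازة جدا!", "الخدمة سيئة جدا"]
docs = [ngrams(preprocess(t), n_max=2) for t in tweets]
weights = tf_idf(docs)
```

The resulting per-document weight dictionaries would then be fed to a classifier such as SVM, NB, or LR. Switching `remove_stop_words`, `n_max`, or the weighting function corresponds to the kinds of variation the four tests compare.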
Keywords
Permanent URL
Articles in current issue
- Design and fabrication of collimating ball lensed fiber for the system of optical radiation output from radiophotonic components
- From Triassic to modernity: Raman spectroscopy for differentiation of fossil resins age by age
- Optimization of geometry of two-dimensional photonic crystal waveguide for telecommunications and sensorics
- Development and investigation of the suppressing additive noises methods in fiber-optic interferometric sensors
- Method for compensating the constant component of noise in the reflectogram of a fiber-optic communication line under conditions of insufficient dynamic range of an optical backscatter reflectometer in the time domain
- Investigation of the method of moving object weight measurement based on quasi-distributed fiber Bragg gratings with temperature compensation
- Modern optical methods of non-contact geometric measurements and reconstruction of object 3D surface shape: a review
- Spectral-luminescent properties of silver clusters Ag1–5 in the ion-exchange layer of silicate glass
- Forming a thick layer of ε-Ga2O3 on the GaN sublayer with V-defects at the interface
- A model for ensuring the continuity of the safe functioning of the product quality traceability system in conditions of unstable communication
- Application of Markov chain Monte Carlo and machine learning for identifying active modules in biological graphs
- Surface defect detection with limited data based on SSD detector and Siamese networks
- Russian parametric corpus RuParam
- Comparative analysis of AI-generated and original abstracts of academic articles on philology
- Enhancing Kubernetes security with machine learning: a proactive approach to anomaly detection
- Prompt-based multi-task learning for robust text retrieval
- Improving question answering in programming domain with pretrained language model finetuning using structured diverse online forum data
- Specification language for automata-based objects cooperation
- Aspects of organizing game interactions among asymmetric agents using graph neural networks
- Development and modeling of technological scheme of steam methane reforming with oxy-fuel combustion and carbon capture
- Stability study of hybrid MOS memristor memory using modified particle swarm optimization method
- Analysis of the vulnerability of YOLO neural network models to the Fast Sign Gradient Method attack